ABSTRACT

The DENDRAL and Meta-DENDRAL programs assist chemists with data
interpretation problems. The design of each program is described in
the context of the chemical inference problems the program solves.
Some chemical results produced by the programs are mentioned.

1 INTRODUCTION


The DENDRAL and Meta-DENDRAL programs are products of a large,
interdisciplinary group of Stanford University scientists concerned
with many and highly varied aspects of the mechanization of scientific
reasoning and the formalization of scientific knowledge for this
purpose. An early motivation for our work was to explore the power of
existing AI methods, such as heuristic search, for reasoning in
difficult scientific problems [7]. Another concern has been to exploit
the AI methodology to understand better some fundamental questions in
the philosophy of science, for example the processes by which
explanatory hypotheses are discovered or judged adequate [18]. From
the  start,  the  project  has  had  an  applications  dimension  [9,10,27].  It 
has  sought  to  develop  "expert  level"  agents  to  assist  in  the  solution 
of  problems  in  their  discipline  that  require  complex  symbolic 
reasoning.  The  applications  dimension  is  the  focus  of  this  paper. 

In  order  to  achieve  high  performance,  the  DENDRAL  programs 
incorporate  large  amounts  of  knowledge  about  the  area  of  science  to 
which  they  are  applied,  structure  elucidation  in  organic  chemistry.  A 
"smart  assistant"  for  a chemist  needs  to  be  able  to  perform  many  tasks 
as  well  as  an  expert,  but  need  not  necessarily  understand  the  domain  at 
the  same  theoretical  level  as  the  expert.  The  over-all  structure 
elucidation  task  is  described  below  (Section  2)  followed  by  a 
description  of  the  role  of  the  DENDRAL  programs  within  that  framework 
(Section 3). The Meta-DENDRAL programs (Section 4) use a weaker body
of knowledge about the domain of mass spectrometry because their task
is to formulate rules of mass spectrometry by induction from empirical
data. A strong model of the domain would bias the rules unnecessarily.


1.1  Historical  Perspective 

The DENDRAL project began in 1965. Then, as now, we were
concerned with the conceptual problems of designing and writing symbol
manipulation programs that used substantial bodies of domain-specific
scientific  knowledge.  In  contrast,  this  was  a time  in  the  history  of 
AI  in  which  most  laboratories  were  working  on  general  problem  solving 
methods,  e.g.,  in  1965  work  on  resolution  theorem  proving  was  in  its 
prime. 


The programs have followed an evolutionary progression.
Initial concepts were translated into a working program; the program
was tested and improved by confronting simple test cases; and finally a
production version of the program including user interaction facilities
was released for real applications. This intertwining of short-term
pragmatic goals and long-term development of new AI science is an
important theme throughout our research. The results presented here
have been produced by DENDRAL programs at various stages of
development.


2 THE  GENERAL  NATURE  OF  THE  APPLICATIONS  TASKS 


2.1  Structure  Elucidation 

The application of chemical knowledge to elucidation of
molecular structures is fundamental to understanding important problems
of biology and medicine. Areas in which we and our collaborators
maintain active interest include: a) identification of natural products
isolated from terrestrial or marine sources, particularly those products
which demonstrate biological activity or which are key intermediates in
biosynthetic pathways; b) verification of the identity of new synthetic
materials; c) identification of drugs and their metabolites in clinical
studies; and d) detection of metabolic disorders of genetic,
developmental, toxic or infectious origins by identification of organic
constituents excreted in abnormal quantities in human body fluids.

In most circumstances, especially in the areas of interest
summarized above, chemists are faced with structural problems where
direct examination of the structure by X-ray crystallography is not
possible. In these circumstances they must resort to structure
elucidation based on data obtained from a variety of physical, chemical
and spectroscopic methods.

This  kind  of  structure  elucidation  involves  a sequence  of  steps 
that  is  roughly  approximated  by  the  following  scenario.  An  unknown 
structure is isolated from some source. The source of the sample and
the isolation procedures employed already provide some clues as to the
chemical constitution of the compound. A variety of chemical, physical
and spectroscopic data are collected on the sample. Interpretation of
these data yields structural hypotheses in the form of functional
groups or more complex molecular fragments. Assembling these fragments
into complete structures provides a set of candidate structures for the
unknown. These candidates are examined and experiments are designed to
differentiate among them. The experiments, usually collecting
additional spectroscopic data and executing sequences of chemical
reactions, result in new structural information which serves to reduce
the set of candidate structures. Eventually enough information is
inferred  from  experimental  data  to  constrain  the  candidates  to  the 
correct  structure. 

As long as time permits and the number of unknown structures is
small, a manual approach will usually be successful, as it has been in
the past. However, the manual approach is amenable to a high degree of
computer assistance, which is increasingly necessary for both practical
and scientific reasons. One need only examine current regulatory
activities in fields related to chemistry, or the rate at which new
compounds are discovered or synthesized, to gain a feeling for the
practical need for rapid identification of new structures. More
important, however, is the contribution such computer assistance can
make to scientific creativity in structure elucidation in particular,
and chemistry in general, by providing new tools to aid scientists in
hypothesis formation. The automated approaches discussed in this paper
provide a systematic procedure for verifying hypotheses about chemical
structure and ensuring that no plausible alternatives have been
overlooked.


2.2 Structure Elucidation with Constraints from Mass Spectrometry

The Heuristic DENDRAL Program is designed to help organic
chemists determine the molecular structure of unknown compounds. Parts
of the program have been highly tuned to work with experimental data
from an analytical instrument known as a mass spectrometer. Mass
spectrometry is a new and still developing analytic technique. It is
not ordinarily the only analytic technique used by chemists, but is one
of a broad array, including nuclear magnetic resonance (NMR), infrared
(IR), ultraviolet (UV), and "wet chemistry" analyses. Mass spectrometry
is particularly useful when the quantity of the sample to be identified
is very small, for it requires only micrograms of sample.

A mass spectrometer bombards the chemical sample with
electrons, causing fragmentations and rearrangements of the molecules.
Charged fragments are collected by mass. The data from the instrument,
recorded in a histogram known as a mass spectrum, show the masses of
charged fragments plotted against the relative abundance of the
fragments at a mass. Although the mass spectrum for each molecule may
be nearly unique, it is still a difficult task to infer the molecular
structure from the 100-300 data points in the mass spectrum. The data
are highly redundant because molecules fragment along different
pathways. Thus two different masses may or may not include atoms from
the same part of the molecule. In short, the theory of mass
spectrometry is too incomplete to allow unambiguous reconstruction of
the structure from overlapping fragments.

Throughout this paper we will use the following terms to
describe the actions of molecules in the mass spectrometer:

1) Fragmentation - the breaking of a connected graph (molecule)
into fragments by breaking one or more edges (bonds) within the
graph.

2) Atom migration - the detachment of nodes (atoms) from one
fragment and their reattachment to a second fragment. This process
alters the masses of both fragments.

3) Mass spectral process (or process) - a fragmentation followed
by zero or more atom migrations.
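
The three terms defined above can be made concrete with a small
sketch. It is an illustration only, assuming a molecule stored as a
dictionary of atom types plus a list of bonds, nominal integer masses,
and the hypothetical helper names `fragment`, `migrate` and `mass`;
none of these come from the DENDRAL programs. Methanol is used for
brevity, not as a claim about its actual mass spectrum.

```python
from collections import defaultdict

# Nominal integer masses (an assumption made for this sketch).
MASS = {"C": 12, "N": 14, "O": 16, "H": 1}


def components(atoms, bonds):
    """Return the connected components of the graph as sets of atom ids."""
    adj = defaultdict(set)
    for a, b in bonds:
        adj[a].add(b)
        adj[b].add(a)
    seen, parts = set(), []
    for start in atoms:
        if start in seen:
            continue
        stack, part = [start], set()
        while stack:
            node = stack.pop()
            if node in part:
                continue
            part.add(node)
            stack.extend(adj[node] - part)
        seen |= part
        parts.append(part)
    return parts


def fragment(atoms, bonds, broken):
    """Fragmentation: delete the listed bonds and return the fragments."""
    broken = {frozenset(b) for b in broken}
    kept = [b for b in bonds if frozenset(b) not in broken]
    return components(atoms, kept)


def migrate(source, target, atom_ids):
    """Atom migration: move atoms from one fragment to another."""
    moved = set(atom_ids)
    return source - moved, target | moved


def mass(atoms, part):
    """Nominal mass of a fragment given as a set of atom ids."""
    return sum(MASS[atoms[i]] for i in part)


# Methanol, CH3-OH, with explicit hydrogens (illustrative input).
atoms = {0: "C", 1: "O", 2: "H", 3: "H", 4: "H", 5: "H"}
bonds = [(0, 1), (0, 2), (0, 3), (0, 4), (1, 5)]

# A "process": break the C-O bond, then migrate the hydroxyl hydrogen.
methyl, hydroxyl = fragment(atoms, bonds, [(0, 1)])
hydroxyl_after, methyl_after = migrate(hydroxyl, methyl, [5])
```

Before the migration the two fragments have masses 15 and 17; after it
both have mass 16, which shows how a migration alters the masses of
both fragments.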


2.3 Structure Elucidation with Constraints from Other Data

Other analytic techniques are commonly used in conjunction
with, or instead of, mass spectrometry. Some rudimentary capabilities
exist in the DENDRAL programs to interpret proton NMR and Carbon 13
(13C) NMR spectra. For the most part, however, interpretation of other
spectroscopic and chemical data has been left to the chemist. The
programs still need to be able to integrate the chemist's partial
knowledge into the generation of structural alternatives.


3 HEURISTIC DENDRAL AS AN INTELLIGENT ASSISTANT


3.1  Method 

Heuristic DENDRAL is organized as a Plan - Generate - Test
sequence. This is not necessarily the same method used by chemists,
but it is easily understood by them. It complements their methods by
providing such a meticulous search through the space of molecular
structures that the chemist is virtually guaranteed that any candidate
structure which fails to appear on the final list of plausible
structures has been rejected for explicitly stated chemical reasons.

The three main parts of the program are discussed below,
starting with the generator because of its fundamental importance.


3.1.1  The  Generator 

The heart of a heuristic search program is a generator of the
search space. In a chess playing program, for example, the legal move
generator completely defines the space of moves and move sequences. In
Heuristic DENDRAL the legal move generator is based on the DENDRAL
algorithm developed by J. Lederberg [1-4]. This algorithm specifies a
systematic enumeration of molecular structures. It treats molecules as
planar graphs and generates successively larger graph structures until
all chemical atoms are included in graphs in all possible arrangements.
Because graphs with cycles presented special problems, initial work
was limited to chemical structures without rings (with the exception of
[21]).
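
To give a feel for what an exhaustive, duplicate-free enumeration of
ring-free structures involves, here is a minimal sketch. It is not
Lederberg's algorithm and not the CONGEN generator: it grows carbon
skeletons one atom at a time (hydrogens omitted, at most four
neighbors per atom) and discards duplicates after the fact by
comparing a canonical string for each tree. The function names
`rooted_code`, `canonical` and `skeletons` are hypothetical.

```python
def rooted_code(adj, node, parent):
    """Parenthesis code of the subtree below `node`, children sorted."""
    children = sorted(
        rooted_code(adj, nb, node) for nb in adj[node] if nb != parent
    )
    return "(" + "".join(children) + ")"


def canonical(adj):
    """Canonical form of an unrooted tree: smallest code over all roots."""
    return min(rooted_code(adj, root, None) for root in range(len(adj)))


def skeletons(n_atoms, max_valence=4):
    """All distinct acyclic skeletons with n_atoms >= 1 heavy atoms.

    Each skeleton is an adjacency list. Growth is by attaching one new
    atom to any site that still has a free valence; isomorphic results
    are collapsed through their canonical form.
    """
    level = {canonical([[]]): [[]]}  # a single atom with no bonds
    for _ in range(n_atoms - 1):
        grown_level = {}
        for adj in level.values():
            for site in range(len(adj)):
                if len(adj[site]) >= max_valence:
                    continue  # no free bonding site on this atom
                grown = [list(nbrs) for nbrs in adj]
                grown[site].append(len(adj))
                grown.append([site])
                grown_level.setdefault(canonical(grown), grown)
        level = grown_level
    return list(level.values())


counts = [len(skeletons(n)) for n in range(1, 8)]
```

For one to seven carbon atoms the counts agree with the familiar
numbers of alkane isomers (1, 1, 1, 2, 3, 5, 9). Filtering duplicates
afterwards is the simplest design, but it becomes wasteful as the
structures grow, which is one reason a production generator would
prefer to avoid creating duplicates in the first place.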

The number of chemical graphs for molecular formulas of
interest to chemists can be extremely large. Thus it is essential to
constrain structure generation to only plausible molecular structures.
The CONGEN program [44] is the DENDRAL hypothesis generator now in
use. It accepts problem statements of (a) the number of atoms of each
type in the molecule and (b) constraints on the correct hypothesis, in
order to generate all chemical graphs that fit the stated constraints.
These problem statements may come from a chemist interpreting his own
experimental data or from a spectrometric data analysis program.

The purpose of CONGEN is to assist the chemist in determining
the chemical structure of an unknown compound by 1) allowing him to
specify certain types of structural information about the compound
which he has determined from any source (e.g., spectroscopy, chemical
degradation, method of isolation, etc.) and 2) generating an exhaustive
and non-redundant list of structures that are consistent with the
information. The generation is a stepwise process, and the program
allows interaction at every stage: based upon partial results the
chemist may be reminded of additional information which he can specify,
thus limiting further the number of structural possibilities.

CONGEN breaks the problem down into several types of
subproblems, for example: (i) hydrogen atoms are omitted; (ii) parts of
the graph containing no cycles are generated separately from cyclic
parts (and combined at the end); (iii) cycles containing only unnamed
nodes are generated before labeling the nodes with names of chemical
atoms (e.g., carbon or nitrogen); (iv) cycles containing only
three-connected (or higher) nodes (e.g., nitrogen or tertiary carbon)
are generated before mapping two-connected nodes (e.g., oxygen or
secondary carbon) onto the edges. At each step several constraints may
be applied to limit the number of emerging chemical graphs [49].

At the heart of CONGEN are two algorithms whose validity has
been mathematically proven and whose computer implementation has been
well tested. The structure generation algorithm [31,36,39,40] is
designed to determine all topologically unique ways of assembling a
given set of atoms, each with an associated valence, into molecular
structures. The atoms may be chemical atoms with standard chemical
valences, or they may be names representing molecular fragments
("superatoms") of any desired complexity, where the valence corresponds
to the total number of bonding sites available within the superatom.
Because the structure generation algorithm can produce only structures
in which the superatoms appear as single nodes (we refer to these as
intermediate structures), a second procedure, the imbedding algorithm
[36,44], is needed to expand the superatoms to their full chemical
identities.

Footnote (on graphs with cycles): The symmetries of cyclic graphs
prevented prospective avoidance of duplicates during generation. Brown,
Hjelmeland and Masinter solved these problems in both theory and
practice [31,36].

Footnote (on the name CONGEN): named for constrained generator.

A substantial amount of effort has been devoted to modifying
these two basic procedures, particularly the structure generation
algorithm, to accept a variety of other structural information
(constraints), using it to prune the list of structural possibilities.
Current capabilities include specification of good and bad
substructural features, good and bad ring sizes, proton distributions
and connectivities of isoprene units [49]. Usually, the chemist has
additional information (if only some general rules about chemical
stability) of which the program has little knowledge but which can be
used to limit the number of structural possibilities. For example, he
may know that the chemical procedures used to isolate the compound
would change organic acids to esters and thus the program need not
consider structures with unchanged acid groups. Also, he is given the
facility to impart this knowledge interactively to the program.

To make CONGEN easy to use by research chemists, the program
has been provided with an interactive "front end". This interface
contains EDITSTRUC, an interactive structure editor, DRAW, a
teletype-oriented structure display program [58], and the CONGEN
"executive" program which ties together the individual subprograms,
such as subprograms for defining superatoms and substructures, creating
and editing lists of constraints or superatoms, and saving and
restoring superatoms, constraints and structures from secondary storage
(disc). The resulting system, for which comprehensive user-level
documentation has been prepared, is running on the SUMEX computing
facility at Stanford and is available nationwide over the TYMNET
network [46]. The use of CONGEN by chemists doing structure elucidation
is discussed in section 3.4.


3.1.2  The  Planning  Programs 

Although CONGEN is designed to be useful as a stand-alone
package, some assistance can also be given with the task of inferring
constraints for the generator. This is done by planning programs that
analyze instrument data and infer constraints (see [10,22,28]).


The DENDRAL Planner uses a large amount of knowledge of mass
spectrometry to infer constraints. For example, it may infer that the
unknown molecule is probably a ketone but definitely not a
methyl-ketone. Planning information like this is put on the generator's
lists of good and bad structural features. Planning has been limited
almost entirely to mass spectrometry, but the same techniques can be
used with other data sources as well.

The DENDRAL Planner [28] allows for cooperative (man-machine)
problem solving in the interpretation of mass spectra. It uses the
chemist's relevant knowledge of mass spectrometry and applies it
systematically to the spectrum of an unknown. That is, using the
chemist's  definitions  of  the  structural  skeleton  of  the  molecule  and 
the  relevant  fragmentation  rules,  the  program  does  the  bookkeeping  of 
associating  peaks  with  fragments  and  the  combinatorics  of  finding 
consistent  ways  of  placing  substituents  around  the  skeleton. 

The output from the DENDRAL Planner is a list of structure
descriptions with as much detail filled in as the data and defined
fragmentations will allow. Because there are limits to the degree of
refinement allowed by mass spectrometry alone, sets of atoms are
assigned to sets of skeletal nodes. Thus the task of fleshing out the
plan - specifying possible structures assigned to specific skeletal
nodes - is left to CONGEN.


3.1.3  The  Testing  and  Ranking  Programs 

The programs MSPRUNE [61] and MSRANK [59] use a large amount of
knowledge of mass spectrometry to make testable predictions from each
plausible candidate molecule. Predicted data are compared to the data
from the unknown compound to throw out some candidates and rank the
others [10,59,61].

MSPRUNE works with (a) a list of candidate structures from
CONGEN, and (b) the mass spectrum of the unknown molecule. It uses a
fairly simple theory of mass spectrometry to predict commonly expected
fragmentations for each candidate structure. Predictions which deviate
greatly from the observed spectrum are considered prima facie evidence
of incorrectness; the corresponding structures are pruned from the
list. MSRANK then uses more subtle rules of mass spectrometry to rank
the remaining structures according to the number of predicted peaks
found (and not found) in the observed data, weighted by measures of
importance of the processes producing those peaks.
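
The prune-then-rank idea can be sketched in a few lines. This is a
hedged illustration, not the MSPRUNE or MSRANK code: the functions
`prune` and `rank`, the `max_missing` threshold, the scoring formula
and all peak values below are assumptions chosen for the example. Each
candidate is represented only by its predicted peaks, given as a
mapping from fragment mass to an importance weight.

```python
def prune(candidates, observed, max_missing=1):
    """Keep candidates whose predictions do not deviate too far.

    A candidate is dropped when more than `max_missing` of its
    predicted peaks are absent from the observed spectrum.
    """
    kept = {}
    for name, predicted in candidates.items():
        missing = [m for m in predicted if m not in observed]
        if len(missing) <= max_missing:
            kept[name] = predicted
    return kept


def rank(candidates, observed):
    """Order candidate names from best to worst.

    Each predicted peak adds its weight when found in the observed
    data and subtracts it when not found.
    """
    def score(predicted):
        return sum(w if m in observed else -w for m, w in predicted.items())

    return sorted(
        candidates, key=lambda name: score(candidates[name]), reverse=True
    )


# Made-up data: observed peak masses and three candidate structures.
observed = {43, 57, 71, 86}
candidates = {
    "A": {43: 1.0, 57: 0.5, 71: 0.5},
    "B": {43: 1.0, 58: 1.0, 71: 0.5},
    "C": {29: 1.0, 58: 1.0, 72: 0.5},
}

survivors = prune(candidates, observed)
ordering = rank(survivors, observed)
```

With these inputs candidate C is pruned because none of its predicted
peaks are observed, and A ranks ahead of B because B predicts a
heavily weighted peak that is missing from the data.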


3.2  Research  Results 


The Heuristic DENDRAL effort has shown that it is possible to
write a computer program that equals the performance of experts in some
limited areas of science. Published papers on the program's analysis
of aliphatic ketones, amines, ethers, alcohols, thiols and thioethers
[15,19,20,22] make the point that although the program does not know
more than an expert (and in fact knows far less), it performs well
because of its systematic search through the space of possibilities and
its systematic use of what it does know. A paper on the program's
analysis of estrogenic steroids makes the point that the program can
solve structure elucidation problems for complex organic molecules [28]
of current biological interest. Another paper on the analysis of mass
spectra of mixtures of estrogenic steroids (without prior separation)
establishes the program's ability to do better than experts on some
problems [32]. With mixtures, the program succeeds, and people fail,
because of the magnitude of the task of correlating data points with
each possible fragmentation of each possible component of the mixture.
Several articles based on results from CONGEN demonstrate its power and
utility for solving current research problems of medical and
biochemical importance [42,48,50,53,62,58].


3.3  Human  Engineering 

A successful applications program must demonstrate competence,
as the previous section emphasized. However, it is also necessary to
design the programs to achieve acceptability by the scientists for
whom the AI system is written. That is, without proper attention to
human engineering and similar issues, a complex applications program
will not be widely used. Besides making the I/O language easy for the
user to understand, it is also important to make the scope and
limitations of the problem solving methods known to the user as much as
possible [60].

The features designed into DENDRAL programs to make them easier
and more pleasant to use include graphical drawings of chemical
structures [58], a stylized but easily understood language for
expressing and editing chemical constraints [44], on-line help
facilities [60], depth-first problem solving to produce some solutions
quickly, and estimators of problem size and (at any time) amount of
work remaining. Documentation and user manuals are written at many
levels of detail. One of our staff is almost always available for
consultation by phone or message [46].


3.4 Applications of CONGEN to Chemical Problems


Many persons have used DENDRAL programs (mostly CONGEN) in an
experimental  mode.  Some  chemists  have  used  programs  on  the  SUMEX 
machine,  others  have  requested  help  by  mail,  and  a few  have  imported 
programs  to  their  own  computers. 

Copies  of  programs  have  been  distributed  to  chemists  requesting 
them.  However,  we  have  strongly  suggested  that  persons  access  the 
local  versions  by  TYMNET  to  minimize  the  number  of  different  versions 
we  maintain  and  to  avoid  the  need  for  rewriting  the  INTERLISP  code  for 
another  machine. 

Users  do  not  always  tell  us  about  the  problems  they  solve  using 
the DENDRAL programs. To some extent this is one sign of a successful
application.  The  list  below  thus  represents  only  a sampling  of  the 
chemical  problems  to  which  the  programs  have  been  applied.  CONGEN  is 
most  used,  although  other  DENDRAL  subprograms  have  been  used 
occasionally. 

Since  the  SUMEX  computer  is  available  over  the  TYMNET  network, 
it  is  possible  for  scientists  in  many  parts  of  the  world  to  access  the 
DENDRAL  programs  on  SUMEX  directly.  Many  scientists  interested  in 
using DENDRAL programs in their own work are not located near a network
access point, however. These chemists use the mail to send details of
their structure elucidation problem to a DENDRAL Project collaborator
at  Stanford. 

DENDRAL programs have been used to aid in structure
determination problems of the following kinds:

terpenoid natural products from plant and marine animal sources

marine sterols

organic acids in human urine and other body fluids

photochemical rearrangement products

impurities in manufactured chemicals

conjugates of pesticides with sugars and amino acids

antibiotics

metabolites of microorganisms

insect hormones and pheromones

CONGEN was also applied to published structure elucidation
problems by students in Prof. Djerassi's class on spectroscopic
techniques to check the accuracy and completeness of the published
solutions. For several cases, the program found structures which were
plausible alternatives to the published structures (based on the problem
constraints that appeared in the article). This kind of information
thus serves as a valuable check on conclusions drawn from experimental
data.


4 META-DENDRAL 

Because of the difficulty of extracting domain-specific rules
from experts for use by DENDRAL, a more efficient means of transferring
knowledge into the program was sought. Two alternatives to
"hand-crafting" each new knowledge base have been explored: interactive
knowledge transfer programs and automatic theory formation programs.
In this enterprise the separation of domain-specific knowledge from the
computer programs themselves has been critical.

One of the stumbling blocks with programs for the interactive
transfer of knowledge is that for some areas of chemistry there are no
experts with enough specific knowledge to make a high performance
problem solving program (see [16]). It is desirable to avoid forcing
an expert to focus on original data in order to codify the rules
explaining those data because that is such a time-consuming process.
For these reasons an effort to build an automatic rule formation
program (called Meta-DENDRAL) was initiated.

The DENDRAL programs are structured to read their task-specific
knowledge from tables of production rules and execute the rules in new
situations, under rather elaborate control structures. The
Meta-DENDRAL programs have been constructed to aid in building the
knowledge base, i.e., the tables of rules.


4.1  The  Task 

The present Meta-DENDRAL program [51,63] interactively helps
chemists determine the dependence of mass spectrometric fragmentation
on substructural features, under the hypothesis that molecular
fragmentations are related to topological graph structural features of
molecules. Our goal is to have the program suggest qualitative
explanations of the characteristic fragmentations and rearrangements
among a set of molecules. We do not now attempt to rationalize all
peaks nor find quantitative assessments of the extent to which various
processes contribute to peak intensities.

The program emulates many of the reasoning processes of manual
approaches to rule discovery. It reasons symbolically, using a modest
amount of chemical knowledge. It decides which data points are
important and looks for fragmentation processes that will explain them.
It attempts to form general rules by correlating plausible
fragmentation processes with substructural features of the molecules.
Then, as a chemist does, the program tests and modifies the rules.

Each I/O pair for Meta-DENDRAL is: (INPUT) a chemical sample
with uniform molecular structure (abbreviated to "a structure");
(OUTPUT) one X-Y point from the histogram of fragment masses and
relative abundances of fragments (often referred to as one peak in the
mass spectrum).

Since the spectrum of each structure contains 100 to 300
different data points, each structure appears in many I/O pairs. Thus,
the program must look for several generating principles, or processes,
that operate on a structure to produce many data points. In addition,
the data are not guaranteed correct because these are empirical data
which may contain noise or contributions from impurities in the
original sample. As a result, the program does not attempt to explain
every I/O pair. It does, however, choose which data points to explain
on the basis of criteria given by the chemist as part of the imposed
model of mass spectrometry.

Rules of mass spectrometry actually used by chemists are often
expressed as what AI scientists would call production rules. These
rules (when executed by a program) constitute a simulation of the
fragmentation and atom migration processes that occur inside the
instrument. The left-hand side is a description of the graph structure
of some relevant piece of the molecule. The right-hand side is a list
of processes which occur: specifically, bond cleavages and atom
migrations. For example, one simple rule is

(R1) N - C - C - C ---> N - C * C - C

where the asterisk indicates breaking the bond at that position and
recording the mass of the fragment to the left of the asterisk. (No
migration of atoms between fragments is predicted by this rule.)
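
As a rough illustration of how a rule like R1 could be executed, the
sketch below matches the left-hand side against a molecule and records
the mass of the fragment on the nitrogen side of the broken bond. It
assumes an acyclic molecule, nominal integer masses, and a
representation in which each non-hydrogen atom carries its element and
its hydrogen count; the function names `find_paths`, `side_of` and
`apply_r1` are hypothetical and do not come from Meta-DENDRAL.

```python
# Nominal integer masses (an assumption made for this sketch).
MASS = {"C": 12, "N": 14, "O": 16, "H": 1}


def find_paths(atoms, adj, pattern):
    """Yield every simple path whose atom types spell out `pattern`."""
    def extend(path):
        if len(path) == len(pattern):
            yield tuple(path)
            return
        for nb in adj[path[-1]]:
            if nb not in path and atoms[nb][0] == pattern[len(path)]:
                yield from extend(path + [nb])

    for start in atoms:
        if atoms[start][0] == pattern[0]:
            yield from extend([start])


def side_of(adj, start, broken):
    """Atoms reachable from `start` without crossing the broken bond.

    For a molecule with rings this would not isolate a fragment, which
    is why the sketch assumes acyclic input.
    """
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(
            nb for nb in adj[node] if frozenset((node, nb)) != broken
        )
    return seen


def apply_r1(atoms, adj):
    """Predicted peak masses from R1: N - C - C - C ---> N - C * C - C."""
    peaks = set()
    for path in find_paths(atoms, adj, ["N", "C", "C", "C"]):
        broken = frozenset((path[1], path[2]))
        left = side_of(adj, path[0], broken)
        peaks.add(
            sum(MASS[atoms[i][0]] + atoms[i][1] * MASS["H"] for i in left)
        )
    return peaks


# n-butylamine, NH2-CH2-CH2-CH2-CH3: atom id -> (element, hydrogens).
atoms = {0: ("N", 2), 1: ("C", 2), 2: ("C", 2), 3: ("C", 2), 4: ("C", 3)}
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
predicted = apply_r1(atoms, adj)
```

For this input the rule matches once and predicts a single peak at
mass 30, the CH2-NH2 fragment. A molecule with no N - C - C - C chain,
such as diethylamine, yields no prediction from this rule.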

Although the vocabulary for describing individual atoms in
subgraphs is small and the grammar of subgraphs is simple, the size of
the subgraph search space is large. In addition to the connectivity of
the subgraph, each atom in the subgraph may have up to four (dependent)
attributes specified: (a) Atom type (e.g., carbon), (b) Number of
connected neighbors (other than hydrogen), (c) Number of hydrogen
neighbors, and (d) Number of doubly-bonded neighbors. The size of the
space to consider, for example, for subgraphs containing 6 atoms, each
with any of (say) 20 attribute-value specifications, is 20^6 possible
subgraphs.

The language of processes (right-hand sides of rules) is also
simple but can describe many combinations of actions: one or more bonds
from the left-hand side may break and zero or more atoms may migrate
between fragments.


4.2  Method 

The rule formation process for Meta-DENDRAL is a three-stage
sequence similar to the plan-generate-test sequence used in Heuristic
DENDRAL. In Meta-DENDRAL, the generator (RULEGEN), described in section
4.2.2 below, generates plausible rules within syntactic and semantic
constraints and within desired limits of evidential support. The model
used to guide the generation of rules is particularly important since
the space of rules is very large. The model of mass spectrometry in
the program is highly flexible and can be modified by the user to suit
his own biases and assumptions about the kinds of rules that are
appropriate for the compounds under consideration. The model
determines (i) the vocabulary to be used in constructing rules, (ii)
the syntax of the rules (as before, the left-hand side of a rule
describes a chemical graph, the right-hand side describes a
fragmentation and/or rearrangement process to be expected in the mass
spectrometer), and (iii) some semantic constraints governing the
plausibility of rules. For example, the chemist can use a subset of
the terms available for describing chemical graphs, can restrict the
number of chemical atoms described in the left-hand sides of rules, and
can restrict the complexity of processes considered in the right-hand
sides [63].

The planning part of the program (INTSUM), described in 4.2.1,
collects and summarizes the evidential support. The testing part
(RULEMOD), described in 4.2.3, looks for counterexamples to rules and
makes modifications to the rules in order to increase their generality
and simplicity and to decrease the total number of rules. These three
major components are discussed briefly in the following subsections.


4.2.1  Interpret  Data  as  Evidence  for  Processes 

The INTSUM program [33] (named for data interpretation and
summary) interprets spectral data of known compounds in terms of
possible fragmentations and atom migrations. For each molecule in a
given set, INTSUM first produces the plausible processes which might
occur, i.e., breaks and combinations of breaks, with and without atom
migrations. These processes are associated with specific bonds in a
portion of molecular structure, or skeleton, that is chosen because it
is common to the molecules in the given set. Then INTSUM examines the
spectra of the molecules looking for evidence (spectral peaks) for each
process.

Notice that the association of processes with data points may
be ambiguous. For instance, in the molecule CH3-CH2-CH2-NH-CH2-CH3 a
spectral peak at mass 29 may be attributed to a process which breaks
either the second bond from the left or one which breaks the second
bond from the right, both producing CH3-CH2 fragments.
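
The ambiguity can be reproduced with a small calculation. The sketch
below enumerates only single-bond breaks in an unbranched chain and
records which breaks could account for each fragment mass. The group
masses, the list representation, and the function name
single_break_evidence are illustrative assumptions. INTSUM itself also
considers combinations of breaks and atom migrations, which are omitted
here.

```python
from collections import defaultdict

# Nominal masses of the chain groups used in the example molecule.
GROUP_MASS = {"CH3": 15, "CH2": 14, "NH": 15}

def single_break_evidence(groups):
    """Map each fragment mass to the single-bond breaks that produce it.

    Bonds are numbered from the left, starting at 1. Each entry is a
    pair (bond number, side of the break that forms the fragment).
    """
    explanations = defaultdict(list)
    for bond in range(1, len(groups)):
        left = sum(GROUP_MASS[g] for g in groups[:bond])
        right = sum(GROUP_MASS[g] for g in groups[bond:])
        explanations[left].append((bond, "left"))
        explanations[right].append((bond, "right"))
    return dict(explanations)

molecule = ["CH3", "CH2", "CH2", "NH", "CH2", "CH3"]
evidence = single_break_evidence(molecule)
print(evidence[29])  # [(2, 'left'), (4, 'right')]
```

Bond 4 counted from the left is the second bond from the right, so the
peak at mass 29 has two competing explanations, as described above.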


4.2.2  Generate  Candidate  Rules 

After the data have been interpreted by INTSUM, control passes
to a heuristic search program known as RULEGEN [51], for rule
generation. RULEGEN creates general rules by selecting "important"
features of the molecular structure around the site of the
fragmentations proposed by INTSUM. These important features are
combined to form a subgraph description of the local environment
surrounding the broken bonds. Each subgraph considered becomes the
left-hand side of a candidate rule whose right-hand side is INTSUM's
proposed process. Essentially RULEGEN searches (within the
constraints) through a space of these subgraph descriptions looking for
successively more specific subgraphs that are supported by successively
"better" sets of evidence.

Conceptually, the program begins with the most general
candidate rule, X*X (where X is any unspecified atom and where the
asterisk is used to indicate the broken bond, with the detected
fragment written to the left of the asterisk). Since the most useful
rules lie somewhere between the overly-general candidate, X*X, and the
overly-specific complete molecular structure descriptions (with
specified bonds breaking), the program generates refined descriptions
by successively specifying additional features. This is a coarse
search; for efficiency reasons RULEGEN sometimes adds features to
several nodes at a time, without considering the intermediate
subgraphs.

The program systematically adds features (attribute-value
pairs) to subgraphs, starting with the subgraph X*X, and always making
each successor more specific than its parent. (Recall that each node
can be described with any or all of the following attributes: atom
type, number of non-hydrogen neighbors, number of hydrogen neighbors,
and number of doubly bonded neighbors.) Working outward, the program
assigns one attribute at a time to all atoms that are the same number
of atoms away from the breaking bond. Each of the four attributes is
considered in turn, and each attribute value for which there is
supporting evidence generates a new successor. Although different
values for the same attribute may be assigned to each atom at a given
distance from the breaking bond, the coarseness of the search prevents
examination of subgraphs in which this attribute is totally unimportant
on some of these atoms.
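
The general-to-specific search can be illustrated with a much simplified
sketch. Here each break site proposed as evidence is a flat dictionary
of attribute values rather than a subgraph, a rule is a set of
attribute-value pairs (the empty set playing the role of X*X), and a
successor is kept when at least min_support sites still match it. The
data layout, the support threshold, and the function names are all
assumptions for illustration; RULEGEN's actual evaluation criteria are
not reproduced here.

```python
def matches(rule, site):
    """A site matches when it has every attribute value the rule names."""
    return all(site.get(attr) == val for attr, val in rule)

def support(rule, sites):
    """Number of evidence sites explained by the rule."""
    return sum(1 for site in sites if matches(rule, site))

def specialize(rule, sites, attributes, min_support):
    """Yield successors that add one attribute-value pair to the rule."""
    used = {attr for attr, _ in rule}
    for attr in attributes:
        if attr in used:
            continue
        # Only values seen in the supporting evidence are proposed.
        values = {s[attr] for s in sites if matches(rule, s) and attr in s}
        for val in sorted(values):
            child = rule | {(attr, val)}
            if support(child, sites) >= min_support:
                yield child

def rulegen_sketch(sites, attributes, min_support=2):
    """Return the most specific rules that keep enough support."""
    root = frozenset()  # the most general rule, analogous to X*X
    frontier, seen, leaves = [root], {root}, []
    while frontier:
        rule = frontier.pop()
        children = list(specialize(rule, sites, attributes, min_support))
        if not children:
            leaves.append(rule)
        for child in children:
            if child not in seen:
                seen.add(child)
                frontier.append(child)
    return leaves

# Hypothetical evidence: attributes of the atom left of the broken bond.
sites = [
    {"left_type": "C", "left_h": 2},
    {"left_type": "C", "left_h": 2},
    {"left_type": "N", "left_h": 1},
]
rules = rulegen_sketch(sites, ["left_type", "left_h"])
print([sorted(r) for r in rules])  # [[('left_h', 2), ('left_type', 'C')]]
```

Unlike this sketch, which adds one attribute to one position at a time,
the search described above assigns an attribute to all atoms at the
same distance from the breaking bond in a single step.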


4.2.3 Refine and Test the Rules


The last phase of Meta-DENDRAL (called RULEMOD) [51] evaluates
the plausible rules generated by RULEGEN and modifies them by making
them more general or more specific. In contrast to RULEGEN, RULEMOD
considers negative evidence (incorrect predictions) of rules in order
to increase the accuracy of the rule's applications within the training
set. While RULEGEN performs a coarse search of the rule space for
reasons of efficiency, RULEMOD performs a localized, fine search to
refine the rules.

RULEMOD will typically output a set of 5 to 10 rules covering
substantially the same training data points as the input RULEGEN set of
approximately 25 to 100 rules, but with fewer incorrect predictions.
This program is written as a set of five tasks, corresponding to the
five points below.

Selecting a Subset of Important Rules. The local evaluation in
RULEGEN has ignored negative evidence and has not discovered that
different RULEGEN pathways may yield rules which are different but
explain many of the same data points. Thus there is often a high
degree of overlap in those rules and they may make many incorrect
predictions. The initial selection removes most of the redundancy in
the rule set.

Merging Rules. For any subset of rules which explain many of
the same data points, the program attempts to find a slightly more
general rule that (a) includes all the evidence covered by the
overlapping rules and (b) does not bring in extra negative evidence.
If it can find such a rule, the overlapping rules are replaced by the
single compact rule.

Deleting Negative Evidence by Making Rules More Specific.
RULEMOD tries to add attribute-value specifications to atoms in each
rule in order to delete some negative evidence while keeping all of the
positive evidence. This involves local search of the possible
additions to the subgraph descriptions that were not considered by
RULEGEN. Because of the coarseness of RULEGEN's search, some ways of
refining rules are not tried, except by RULEMOD.


Making Rules More General. RULEGEN often forms rules that are
more specific than they need to be. Thus RULEMOD seeks a more general
form that covers the same (and perhaps new) data points without
introducing new negative evidence.

Selecting the Final Rule Set. The selection procedure applied
at the beginning of RULEMOD is applied again at the very end of RULEMOD
in order to remove redundancies that might have been introduced during
generalization and specialization.
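
The selection step can be sketched as a greedy procedure that repeatedly
keeps the rule explaining the most not-yet-covered data points, net of
its incorrect predictions. The scoring used here (new positive evidence
minus negative evidence), the data layout, and the name select_rules
are assumptions chosen for illustration; the text does not state the
scoring function that RULEMOD actually uses.

```python
def select_rules(rules):
    """Greedily pick a low-redundancy subset of rules.

    rules maps a rule name to a pair:
    (set of data points it explains, number of incorrect predictions).
    """
    remaining = dict(rules)
    covered, chosen = set(), []

    def gain(name):
        positives, negatives = remaining[name]
        # Only data points not already explained count as new evidence.
        return len(positives - covered) - negatives

    while remaining:
        # Sorting the names makes tie-breaking deterministic.
        best = max(sorted(remaining), key=gain)
        if gain(best) <= 0:
            break
        chosen.append(best)
        covered |= remaining.pop(best)[0]
    return chosen

# Hypothetical candidate rules with overlapping evidence.
candidates = {
    "A": ({1, 2, 3, 4}, 1),
    "B": ({3, 4, 5}, 0),
    "C": ({1, 2}, 0),
    "D": ({5}, 2),
}
print(select_rules(candidates))  # ['A', 'B']
```

Rule C is dropped because everything it explains is already covered by
A, and rule D is dropped because its incorrect predictions outweigh the
single data point it explains.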


4.3 Meta-DENDRAL Results


One measure of the proficiency of Meta-DENDRAL is the ability
of the corresponding performance program to predict correct spectra of
new molecules using the learned rules. One of the DENDRAL performance
programs ranks a list of plausible hypotheses (candidate molecules)
according to the similarity of their predictions (predicted spectra) to
observed data. The rank of the correct hypothesis (i.e., the molecule
actually associated with the observed spectrum) provides a quantitative
measure of the "discriminatory power" of the rule set.

The Meta-DENDRAL program has successfully rediscovered known,
published rules of mass spectrometry for two classes of molecules.
More importantly, it has discovered new rules for three closely related
families of structures for which rules had not previously been
reported. Meta-DENDRAL's rules for these classes have been published
in the chemistry literature [51]. Evaluations of all five sets of rules
are discussed in that publication.

Recently Meta-DENDRAL has been adapted to a second
spectroscopic technique, 13C-nuclear magnetic resonance (13C-NMR)
spectroscopy [62,64]. This new version provides the opportunity to
direct the induction machinery of Meta-DENDRAL under a model of 13C-NMR
spectroscopy. It generates rules which associate the resonance
frequency of a carbon atom in a magnetic field with the local
structural environment of the atom. 13C-NMR rules have been generated
and used in a candidate molecule ranking program similar to the one
described above. 13C-NMR rules formulated by the program for two
classes of structures have been successfully used to identify the
spectra of additional molecules (of the same classes, but outside the
set of training data used in generating the rules).

The quality of rules produced by Meta-DENDRAL has been assessed
by (a) obtaining agreement from mass spectroscopists that they are
reasonable explanations of the training data and provide acceptable
predictions for new data, and (b) testing them as discriminators of
structures outside the training set. The question of agreement on
previously characterized sets of molecules is relatively easy, since
the chemist only needs to compare the program's rules and predictions
against published rules and spectra. Agreement has been high on test
sets of amines, estrogenic steroids, and aromatic acids. On new data,
however, the chemist is forced into spot checks. For example, analyses
of some individual androstane spectra from the literature were used as
spot checks on the program's analysis of the collections of androstane
spectra.


The discrimination test is to determine how well a set of rules
allows discrimination of known structures from alternatives on the
basis of comparing predicted and actual spectra. For example, given a
list of structures (S1, ..., Sn) and the mass spectrum for structure
S1, can the rules predict a spectrum for S1 which matches the given
spectrum (for S1) better than spectra predicted for S2, ..., Sn match
the given spectrum? When this test is repeated for each available
spectrum for structures S1-Sn, the discriminatory power of the rules is
determined. The program has found rules with high discriminatory power
[51], but much work remains before we standardize on what we consider
an optimum mix of generality and discriminatory power in rules.
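
A single round of the discrimination test can be sketched as follows.
Spectra are reduced to sets of peak masses and compared with the
Jaccard index; both of these choices, the example peak lists, and the
function names are assumptions made for illustration, since the text
does not specify the similarity measure used by the ranking program.

```python
def jaccard(predicted, observed):
    """Similarity of two peak sets: shared peaks over all peaks."""
    union = predicted | observed
    return len(predicted & observed) / len(union) if union else 1.0

def rank_of_correct(predicted_spectra, observed, correct):
    """Rank (1 = best) of the correct structure among the candidates."""
    ranked = sorted(
        predicted_spectra,
        key=lambda name: jaccard(predicted_spectra[name], observed),
        reverse=True,
    )
    return ranked.index(correct) + 1

# Hypothetical predicted peak masses for three candidate structures.
predicted_spectra = {
    "S1": {29, 58, 72},
    "S2": {29, 43},
    "S3": {15, 44},
}
observed_s1 = {29, 58, 72, 87}  # hypothetical observed spectrum of S1
print(rank_of_correct(predicted_spectra, observed_s1, "S1"))  # 1
```

Repeating the calculation with each structure's observed spectrum in
turn and collecting the ranks would give one summary of the
discriminatory power of a rule set.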


4.3.1  Transfer  to  Applications  Problems 

The INTSUM program has begun to receive attention from chemists
outside the Stanford community, but so far there have been only
inquiries about outside use of the rest of Meta-DENDRAL. INTSUM
provides careful assistance in associating plausible explanations with
data points, within the chemist's own definition of "plausible". This
can save a person many hours, even weeks, of looking at the data under
various assumptions about fragmentation patterns.

The uses of INTSUM have been to investigate the mass spectral
fragmentations of progesterones [54,55], marine sterols and antibiotics
[in progress].


5 PROBLEMS 

The science of AI suffers from the absence of satellite
engineering firms that can map research programs into marketable
products. We have sought alternatives to developing CONGEN ourselves
into a program that is widely available and have concluded that the
time is not yet ripe for a transfer of responsibility. In the future
we hope for two major developments to facilitate dissemination of large
AI programs: (a) off-the-shelf, small (and preferably cheap) computers
that run advanced symbol manipulating languages, especially INTERLISP,
and (b) software firms that specialize in rewriting AI applications
programs to industrial specifications.

While the software is almost too complex to export, our
research-oriented computer facility has too little capacity for import.
Support of an extensive body of outside users means that resources
(people as well as computers) must be diverted from the research goals
of the project.

At considerable cost in money and talent, it has been possible
to export the programs to Edinburgh. But such extensive and
expensive collaborations for technology transfer are almost never done
in AI. Even when the software is rewritten for export, there are too
few "computational chemists" trained to manage and maintain the
programs at local sites.


6 COMPUTERS AND LANGUAGES

The DENDRAL programs are coded largely in INTERLISP and run on
the DEC KI-10 system under the TENEX operating system at the SUMEX
computer resource at Stanford. Parts of CONGEN, including some I/O
packages and graph manipulation packages, are written in other
languages, among them SAIL. We are currently studying the question of
rewriting CONGEN in a less flexible language in order to run the
program on a variety of machines with less power and memory.
Peripheral programs for data acquisition, data filtering, library
search and plotting exist for chemists to use on a DEC PDP 11/45
system, but are coupled to the AI programs only by file transfer.


7 CONCLUSION

CONGEN has attracted a moderately large following of chemists
who consult it for help with structure elucidation problems. INTSUM,
too, is used occasionally by persons collecting and codifying a large
number of mass spectra.

With the exceptions just noted, the DENDRAL and Meta-DENDRAL
programs are not used outside the Stanford University community and
thus they represent only a successful demonstration of scientific
capability. These programs are among the first AI programs to do even
this. The achievement is significant in that the task domain was not
"smoothed" or "tailored" to fit existing AI techniques. On the
contrary, the intrinsic complexity of structure elucidation problems
guided the AI research to problems of knowledge acquisition and
management that might otherwise have been ignored.

The DENDRAL publications in major chemical journals have
introduced to chemists the term "artificial intelligence" along with AI
concepts and methods. The large number of publications in the
chemistry literature also indicates substantial and continued interest
in DENDRAL programs and applications.

R. Carhart is working with Prof. Donald Michie's group to
bring up a version of CONGEN in Edinburgh.